PyTables: Processing and Analyzing Extremely Large Amounts of Data in Python

Authors

  • Francesc Alted
  • Mercedes Fernández-Alonso
Abstract

Processing large amounts of data is a necessity in scientific fields such as meteorology, oceanography, astronomy, astrophysics, experimental physics, and numerical simulation, to name only a few. Existing relational or object-oriented databases are usually good solutions for applications in which multiple distributed clients need to access and update a large, centrally managed database (e.g., a financial trading system). However, they are not optimally designed for efficient read-only queries on pieces, or even single attributes, of objects, which is a requirement for processing data in many of the scientific fields mentioned above. This paper describes PyTables [1], a Python library that addresses this need, enabling the end user to easily manipulate scientific data tables and regular homogeneous Python data objects (such as Numeric [2] arrays) in a persistent, hierarchical structure. The foundation of the underlying hierarchical data organization is the excellent HDF5 [3] C library.
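To make the abstract's description concrete, the following is a minimal sketch of storing a table and a homogeneous array in a hierarchical HDF5 file and then running a read-only query that returns a single attribute. It assumes a current PyTables 3.x installation and uses NumPy in place of Numeric; the method names (`open_file`, `create_table`, and so on) are those of the modern API and differ from the spelling used in the 2003 release. The `Reading` record, the group and node names, and the sample values are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np
import tables


class Reading(tables.IsDescription):
    """Hypothetical record layout for one sensor reading."""
    station = tables.StringCol(16)      # fixed-width byte string
    pressure = tables.Float64Col()
    temperature = tables.Float64Col()


def build_and_query(path):
    """Write a small table and array, then query one column read-only."""
    with tables.open_file(path, mode="w", title="Demo file") as h5:
        # Groups play the role of directories in the hierarchy.
        group = h5.create_group("/", "observations", "Sensor data")
        table = h5.create_table(group, "readings", Reading, "Readings")
        row = table.row
        for i in range(10):
            row["station"] = f"st{i % 3}".encode()
            row["pressure"] = 1000.0 + i
            row["temperature"] = 15.0 + 0.5 * i
            row.append()
        table.flush()
        # A regular homogeneous object stored next to the table.
        h5.create_array(group, "grid", np.arange(12).reshape(3, 4), "Grid")

    with tables.open_file(path, mode="r") as h5:
        table = h5.root.observations.readings
        # Select rows by condition, keeping only the 'pressure' attribute.
        hot = [r["pressure"] for r in table.where("temperature > 18.0")]
        grid = h5.root.observations.grid.read()
    return hot, grid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pressures, grid = build_and_query(os.path.join(tmp, "demo.h5"))
        print(pressures, grid.shape)
```

The query touches only the rows that satisfy the condition and extracts a single field from each, which is the kind of partial, read-only access the abstract contrasts with general-purpose databases.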

Publication date: 2003